{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--BOOK_INFORMATION-->\n",
    "<a href=\"https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv\" target=\"_blank\"><img align=\"left\" src=\"data/cover.jpg\" style=\"width: 76px; height: 100px; background: white; padding: 1px; border: 1px solid black; margin-right:10px;\"></a>\n",
    "*This notebook contains an excerpt from the book [Machine Learning for OpenCV](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv) by Michael Beyeler.\n",
    "The code is released under the [MIT license](https://opensource.org/licenses/MIT),\n",
    "and is available on [GitHub](https://github.com/mbeyeler/opencv-machine-learning).*\n",
    "\n",
    "*Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations.\n",
    "If you find this content useful, please consider supporting the work by\n",
    "[buying the book](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv)!*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Combining Different Algorithms Into an Ensemble](10.00-Combining-Different-Algorithms-Into-an-Ensemble.ipynb) | [Contents](../README.md) | [Combining Decision Trees Into a Random Forest](10.02-Combining-Decision-Trees-Into-a-Random-Forest.ipynb) >"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Understanding Ensemble Methods\n",
    "\n",
    "The goal of ensemble methods is to combine the predictions of several individual estimators\n",
    "built with a given learning algorithm in order to solve a shared problem. Typically, an\n",
    "ensemble consists of two major components:\n",
    "- a set of models\n",
    "- a set of decision rules that govern how the results of these models are combined into a single output\n",
    "\n",
    "A consequence of this procedure is that we get a multitude of opinions about any given\n",
    "problem. So how do we know which classifier is right?\n",
    "\n",
    "This is why we need a **decision rule**. Perhaps we consider everybody's opinion of equal\n",
    "importance, or perhaps we would want to weight somebody's opinion based on their expert\n",
    "status. Depending on the nature of our decision rule, ensemble methods can be categorized\n",
    "as follows:\n",
    "- **Averaging methods**: They develop models in parallel and then use averaging or voting techniques to come up with a combined estimator. This is as close to democracy as ensemble methods can get.\n",
    "- **Boosting methods**: They involve building models in sequence, where each added model aims to improve the score of the combined estimator. This is akin to debugging the code of your intern or reading the report of your undergraduate student: they are all bound to make errors, and the job of every subsequent expert laying eyes on the topic is to figure out the special cases where the preceding expert got it wrong.\n",
    "- **Stacking methods**: Also known as **blending methods**, they use the weighted output of multiple classifiers as inputs to the next layer in the model. This is akin to having expert groups who pass on their decision to the next expert group."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding average ensembles\n",
    "\n",
    "An averaging ensemble is essentially a collection of models that train on the same dataset.\n",
    "Their results are then aggregated in a number of ways.\n",
    "\n",
    "One common method involves creating multiple model configurations that take different\n",
    "parameter subsets as input. Techniques that take this approach are referred to collectively\n",
    "as bagging methods.\n",
    "\n",
    "Bagging methods come in many different flavors. However, they typically only differ by the\n",
    "way they draw random subsets of the training set:\n",
    "- **Pasting** methods draw random subsets of the samples without replacement\n",
    "- **Bagging** methods draw random subsets of the samples with replacement\n",
    "- **Random subspace** methods draw random subsets of the features but train on all data samples\n",
    "- **Random patches** methods draw random subsets of both samples and features\n",
    "\n",
    "In scikit-learn, bagging methods can be realized using the meta-estimators\n",
    "`BaggingClassifier` and `BaggingRegressor`. These are meta-estimators because they\n",
    "allow us to build an ensemble from any other base estimator."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementing a bagging classifier\n",
    "\n",
    "We can, for instance, build an ensemble from a collection of 10 *k*-NN classifiers as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.ensemble import BaggingClassifier\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "bagging = BaggingClassifier(KNeighborsClassifier(), n_estimators=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The BaggingClassifier class provides a number of options to customize the ensemble:\n",
    "- n_estimators: As shown in the preceding code, this specifies the number of base estimators in the ensemble.\n",
    "- max_samples: This denotes the number (or fraction) of samples to draw from the dataset to train each base estimator. We can set bootstrap=True to sample with replacement (effectively implementing bagging), or we can set bootstrap=False to implement pasting.\n",
    "- max_features: This denotes the number (or fraction) of features to draw from the feature matrix to train each base estimator. We can set `max_samples`$=1.0$ and `max_features`$<1.0$ to implement the random subspace method. Alternatively, we can set both `max_samples`$<1.0$ and `max_features`$<1.0$ to implement the random patches method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, if we wanted to implement bagging with 10 $k$-NN classifiers with $k=5$, where\n",
    "every $k$-NN classifier is trained on 50% of the samples in the dataset, we would modify the\n",
    "preceding command as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "bag_knn = BaggingClassifier(KNeighborsClassifier(n_neighbors=5),\n",
    "                            n_estimators=10, max_samples=0.5,\n",
    "                            bootstrap=True, random_state=3) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to observe a performance boost, we have to apply the ensemble to some dataset,\n",
    "such as the breast cancer dataset from [Chapter 5](05.00-Using-Decision-Trees-to-Make-a-Medical-Diagnosis.ipynb), *Using Decision Trees to Make a Medical\n",
    "Diagnosis*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_breast_cancer\n",
    "dataset = load_breast_cancer()\n",
    "X = dataset.data\n",
    "y = dataset.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, random_state=3\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.93706293706293708"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bag_knn.fit(X_train, y_train)\n",
    "bag_knn.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The performance boost will become evident once we also train a single $k$-NN classifier on\n",
    "the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.91608391608391604"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "knn = KNeighborsClassifier(n_neighbors=5)\n",
    "knn.fit(X_train, y_train)\n",
    "knn.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Without changing the underlying algorithm, we were able to improve our test score from\n",
    "91.6% to 93.7% by simply letting 10 k-NN classifiers do the job instead of a single one.\n",
    "\n",
    "You're welcome to experiment with other bagging ensembles.\n",
    "\n",
    "For example, in order to change the above code to implement the random patches method, add `max_features=xxx` to the `BaggingClassifier` call in `In [2]`, where `xxx` is a number or fraction of features you want each base estimator to train on."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementing a bagging regressor\n",
    "\n",
    "Similarly, we can use the BaggingRegressor class to form an ensemble of regressors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.ensemble import BaggingRegressor\n",
    "from sklearn.tree import DecisionTreeRegressor\n",
    "bag_tree = BaggingRegressor(DecisionTreeRegressor(),\n",
    "                           max_features=0.5, n_estimators=10, \n",
    "                           random_state=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, we could build an ensemble of decision trees to predict the housing prices\n",
    "from the Boston dataset of [Chapter 3](03.00-First-Steps-in-Supervised-Learning.ipynb), *First Steps in Supervised Learning*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_boston\n",
    "dataset = load_boston()\n",
    "X = dataset.data\n",
    "y = dataset.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, random_state=3\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we can fit the bagging regressor on `X_train` and score it on `X_test`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.82704756225081688"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bag_tree.fit(X_train, y_train)\n",
    "bag_tree.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As in the preceding example, we find a performance boost of roughly 5%, from 77.3%\n",
    "accuracy for a single decision tree to 82.7% accuracy.\n",
    "\n",
    "Of course, we wouldn't just stop here. Nobody said the ensemble needs to consist of 10\n",
    "individual estimators, so we are free to explore different-sized ensembles. On top of that,\n",
    "the `max_samples` and `max_features` parameters allow for a great deal of customization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding boosting ensembles\n",
    "\n",
    "Another approach to building ensembles is through boosting. Boosting models use multiple\n",
    "individual learners in sequence to iteratively boost the performance of the ensemble.\n",
    "\n",
    "Typically, the learners used in boosting are relatively simple. A good example is a decision\n",
    "tree with only a single node—a decision stump. Another example could be a simple linear\n",
    "regression model. The idea is not to have the strongest individual learners, quite the\n",
    "opposite—we want the individuals to be weak learners, so that we get a superior\n",
    "performance only when we consider a large number of individuals."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementing a boosting classifier\n",
    "\n",
    "For example, we can build a boosting classifier from a collection of 10 decision trees as\n",
    "follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.ensemble import GradientBoostingClassifier\n",
    "boost_class = GradientBoostingClassifier(n_estimators=10,\n",
    "                                         random_state=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These classifiers support both binary and multiclass classification.\n",
    "\n",
    "Similar to the `BaggingClassifier` class, the `GradientBoostingClassifier` class\n",
    "provides a number of options to customize the ensemble:\n",
    "- `n_estimators`: This denotes the number of base estimators in the ensemble. A large number of estimators typically results in better performance.\n",
    "- `loss`: This denotes the loss function (or cost function) to be optimized. Setting `loss='deviance'` implements logistic regression for classification with probabilistic outputs. Setting `loss='exponential'` actually results in AdaBoost, which we will talk about in a little bit.\n",
    "- `learning_rate`: This denotes the fraction by which to shrink the contribution of each tree. There is a trade-off between `learning_rate` and `n_estimators`.\n",
    "- `max_depth`: This denotes the maximum depth of the individual trees in the ensemble.\n",
    "- `criterion`: This denotes the function to measure the quality of a node split.\n",
    "- `min_samples_split`: This denotes the number of samples required to split an internal node.\n",
    "- `max_leaf_nodes`: This denotes the maximum number of leaf nodes allowed in each individual tree and so on."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can apply the boosted classifier to the preceding breast cancer dataset to get an idea of\n",
    "how this ensemble compares to a bagged classifier. But first, we need to reload the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dataset = load_breast_cancer()\n",
    "X = dataset.data\n",
    "y = dataset.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, random_state=3\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we find that the boosted classifier achieves 94.4% accuracy on the test set—a little\n",
    "under 1% better than the preceding bagged classifier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.94405594405594406"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "boost_class.fit(X_train, y_train)\n",
    "boost_class.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We would expect an even better score if we increased the number of base estimators from\n",
    "10 to 100. In addition, we might want to play around with the learning rate and the depths\n",
    "of the trees."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementing a boosting regressor\n",
    "\n",
    "Implementing a boosted regressor follows the same syntax as the boosted classifier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.ensemble import GradientBoostingRegressor\n",
    "boost_reg = GradientBoostingRegressor(n_estimators=10,\n",
    "                                      random_state=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have seen earlier that a single decision tree can achieve 79.3% accuracy on the Boston\n",
    "dataset. A bagged decision tree classifier made of 10 individual regression trees achieved\n",
    "82.7% accuracy. But how does a boosted regressor compare?\n",
    "\n",
    "Let's reload the Boston dataset and split it into training and test sets. We want to make sure\n",
    "we use the same value for `random_state` so that we end up training and testing on the\n",
    "same subsets of the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "dataset = load_boston()\n",
    "X = dataset.data\n",
    "y = dataset.target"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, random_state=3\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As it turns out, the boosted decision tree ensemble actually performs worse than the\n",
    "previous code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.71991199075668488"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "boost_reg.fit(X_train, y_train)\n",
    "boost_reg.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This result might be confusing at first. After all, we used 10 times more classifiers than we\n",
    "did for the single decision tree. Why would our numbers get worse?\n",
    "\n",
    "You can see, this is a good example of an expert classifier being smarter than a group of\n",
    "weak learners. One possible solution is to make the ensemble larger. In fact, it is customary\n",
    "to use on the order of 100 weak learners in a boosted ensemble:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "boost_reg = GradientBoostingRegressor(n_estimators=100,\n",
    "                                      random_state=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, when we retrain the ensemble on the Boston dataset, we get a test score of 89.8%:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.89984081091774459"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "boost_reg.fit(X_train, y_train)\n",
    "boost_reg.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What happens when you increase the number to `n_estimators=500`? There's a lot more\n",
    "we could do by playing with the optional parameters.\n",
    "\n",
    "As you can see, boosting is a powerful procedure that allows you to get massive\n",
    "performance improvements by combining a large number of relatively simple learners."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Combining Different Algorithms Into an Ensemble](10.00-Combining-Different-Algorithms-Into-an-Ensemble.ipynb) | [Contents](../README.md) | [Combining Decision Trees Into a Random Forest](10.02-Combining-Decision-Trees-Into-a-Random-Forest.ipynb) >"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
